Genome and whole-genome resequencing of Cinnamomum camphora elucidate its dominance in subtropical urban landscapes

Background: Lauraceae is well known for its significant phylogenetic position as well as its important economic and ornamental value; however, most evergreen species in Lauraceae are restricted to tropical regions. In contrast, the camphor tree (Cinnamomum camphora) is the most dominant evergreen broadleaved tree in subtropical urban landscapes.

Results: Here, we present a high-quality reference genome of C. camphora and conduct comparative genomics between C. camphora and C. kanehirae. Our findings demonstrate the significance of key genes in circadian rhythms and phenylpropanoid metabolism in enhancing the cold response, and show that terpene synthases (TPSs) improved the defence response through tandem duplication and gene cluster formation in C. camphora. Additionally, the first comprehensive catalogue of C. camphora based on whole-genome resequencing of 75 accessions was constructed. It confirmed the crucial roles of the above pathways, revealed candidate genes under selection in more popular C. camphora, and indicated that enhancing environmental adaptation is the primary force driving C. camphora breeding and dominance.

Conclusions: These results decipher the dominance of C. camphora in subtropical urban landscapes and provide abundant genomic resources for enlarging the application scopes of evergreen broadleaved trees.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-023-01692-1.


Fig. S1. Genome-wide analysis of chromatin interactions in the Cinnamomum camphora genome using Hi-C data. Hi-C reads were realigned to the assembly, and the mappings were converted to dot intensities indicating loci that collocate in the nucleus. The resolution is 500 kb.

Fig. S3. Synteny analysis between the Cinnamomum camphora genome assembly in this study and those in published studies. To facilitate the comparison, the genome from this study and the three published ones (Sun et al. 2022 [22]; Jiang et al. 2022 [23]; Wang et al. 2022 [24]) were designated Ccam_HB, Ccam_FJ, Ccam_GX, and Ccam_JX, respectively. (a) Syntenic comparison between the C. camphora genome assembly in this study (Ccam_HB) and the three published C. camphora genomes. (b) The syntenic depth between Ccam_HB and Ccam_JX. (c, d) Two syntenic regions between Ccam_HB and Ccam_JX, randomly selected for visualization.

Fig. S2. The genome size estimation of Cinnamomum camphora by flow cytometric analysis. (a) Flow cytometry estimation of C. camphora with three replicates using tomato (Solanum lycopersicum) as the internal standard (900 Mb). (b) Fluorescence intensity obtained in the experiment (n = 3) and the estimated genome size of C. camphora (687.08 ± 13.38 Mb).
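The estimate in Fig. S2 rests on comparing the sample's fluorescence with that of an internal standard of known genome size. The sketch below illustrates that proportional calculation in Python. It assumes fluorescence scales linearly with DNA content, uses invented peak intensities rather than the measured ones, and reports the spread as a sample standard deviation, which may differ from how the ± value in the caption was derived.

```python
from statistics import mean, stdev

# Genome size of the tomato internal standard, as stated in the caption.
STANDARD_GENOME_MB = 900.0


def estimate_genome_size(sample_peak, standard_peak,
                         standard_size_mb=STANDARD_GENOME_MB):
    """Scale the standard's genome size by the ratio of peak fluorescence.

    Assumes fluorescence intensity is proportional to nuclear DNA content.
    """
    if standard_peak <= 0:
        raise ValueError("standard_peak must be positive")
    return standard_size_mb * sample_peak / standard_peak


# Hypothetical (sample, standard) peak intensities for three replicates.
# These are illustrative values, not the data behind Fig. S2.
replicates = [(76.0, 100.0), (77.5, 100.0), (75.0, 100.0)]

estimates = [estimate_genome_size(s, t) for s, t in replicates]
summary = (mean(estimates), stdev(estimates))  # mean and sample SD in Mb

if __name__ == "__main__":
    print(f"{summary[0]:.2f} ± {summary[1]:.2f} Mb")
```

With real data, the pairs in `replicates` would be replaced by the measured peak positions of C. camphora and tomato in each run.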

Fig. S7. Pfam analyses between Cinnamomum camphora and C. kanehirae. (a) Unique and shared Pfams among C. camphora, C. kanehirae, Liriodendron chinense, Amborella trichopoda, and Arabidopsis thaliana. The number of Pfams is listed in each of the diagram components, and the total number for each plant is provided in parentheses. The top left panel of the Venn diagram is an image of C. camphora, and the top right panel is C. kanehirae. (b) Top 20 enriched GO terms of genes in unique Pfams of C. camphora.

Fig. S8. Weighted gene co-expression network analysis (WGCNA) of the transcriptomes. (a) Clustering dendrogram of genes and merged module colors. In the dendrogram, each leaf corresponds to a gene. (b) Trait-module relationships in the C. camphora transcriptomes. Each row represents a module eigengene; each column represents a seed composition trait. Each cell contains the corresponding correlation and p value. The table is color-coded by correlation according to the color legend.

Fig. S9. Coexpression network of a gene in the circadian rhythm pathway (RON3, Ccam01g03083). This gene is in the unique Pfams of C. camphora and is enriched in several GO terms.

Fig. S11. Top 20 significantly expanded orthogroups in Cinnamomum camphora, sorted by the gene count ratio between C. camphora and C. kanehirae. For every orthogroup, a z-score was calculated for the corresponding abundance in each species. Only orthogroups with a z-score greater than 2.0 were considered significantly expanded in C. camphora. The Arabidopsis gene function/family is shown on the x-axis for each orthogroup.
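The z-score criterion in Fig. S11 can be illustrated with a short Python sketch. The species set, the gene counts, and the choice of the population standard deviation are all assumptions made for illustration; the caption does not specify which species entered the calculation or which variance estimator was used.

```python
from statistics import mean, pstdev

Z_THRESHOLD = 2.0  # cutoff stated in the caption


def zscores(counts):
    """Return the z-score of each species' gene count within one orthogroup."""
    values = list(counts.values())
    mu = mean(values)
    sd = pstdev(values)  # population SD; an assumption, not from the source
    if sd == 0:
        # All species have the same count, so no species stands out.
        return {species: 0.0 for species in counts}
    return {species: (n - mu) / sd for species, n in counts.items()}


def is_expanded(counts, species, threshold=Z_THRESHOLD):
    """True if the orthogroup's z-score in `species` exceeds the threshold."""
    return zscores(counts)[species] > threshold


# Hypothetical gene counts for one orthogroup across eight species.
# "Ccam" and "Ckan" follow the abbreviations used in Fig. S12; the
# remaining labels are placeholders.
example = {
    "Ccam": 30, "Ckan": 5, "sp3": 6, "sp4": 4,
    "sp5": 5, "sp6": 6, "sp7": 5, "sp8": 4,
}

if __name__ == "__main__":
    print(round(zscores(example)["Ccam"], 2), is_expanded(example, "Ccam"))
```

Note that with a population standard deviation the largest attainable z-score among n species is the square root of n − 1, so a cutoff of 2.0 can only be exceeded when more than five species are compared.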

Fig. S12. Analyses of genes under positive selection between Cinnamomum camphora and its closely related species. (a) The gene number and significantly (p value < 0.05) enriched KEGG pathways of genes under positive selection (Ka/Ks > 1) between C. camphora and the 10 species most closely related to it, based on the phylogenetic tree in Fig. 1a. Ccam, C. camphora; Clon, C. longepaniculatum; Ckan, C. kanehirae; Cten, C. tenuipile; Salb, S. albidum; Stzu, S. tzumu; Cver, C. verum; Cbur, C. burmanni; Cchek, C. chekiangense; Cchen, C. chennii; Ccha, C. chago. The black dots indicate gene enrichment in the corresponding KEGG pathways. (b) Expression patterns and corresponding functions of genes in the most enriched gene family, MYB. Only genes whose summed expression across the three tissues is greater than 1.0 are presented in this heatmap.

Fig. S13. Expression patterns of CYP450 genes in different organs and cold acclimation treatments. Control (CK), plants without any treatment; CA2, 2-hour treatment under 4 °C cold acclimation; CA12, 12-hour treatment under 4 °C cold acclimation. Only genes whose summed expression across the three tissues is greater than 1.0 are presented in this heatmap. Genes marked with black dots are the tandem repeats visualized in Fig. 2d.

Fig. S15. Tandemly duplicated genes in Cinnamomum camphora are highly significantly enriched in phenylpropanoid-, flavonoid-, and lignin-related pathways. Circles represent enriched GO terms and are color-coded according to the adjusted p-value. Arrows indicate the hierarchical relationships between GO terms.

Fig. S16. Top 20 enriched GO terms in molecular function of tandem-duplicated genes in Cinnamomum camphora.

Fig. S17. Top 50 metabolites with the highest contents in leaf, stem, and flower of Cinnamomum camphora. The most abundant metabolites are flavonoids.

Fig. S18. Transcriptome analyses of differentially expressed genes (DEGs) with different cold treatments. (a) Venn diagram of DEGs obtained from any two treatments. Control (CK), plants without any treatment; CA2, 2-hour treatment under 4 °C cold acclimation; CA12, 12-hour treatment under 4 °C cold acclimation. (b) Gene expression patterns and significantly (padj. value) enriched KEGG pathways of specific DEGs obtained from any two treatments.

Fig. S19. Gene expression in the phenylpropanoid metabolic pathway indicates the contribution to cold tolerance in Cinnamomum camphora. The name of an enzyme is in the same colour as that of its encoding genes.

Fig. S21. TPS-related WGCNA module and GO enrichment of genes coexpressed with TPSs. (a) The heatmap of all genes and the eigengene expression pattern in the blue module, which was significantly correlated with flowers. (b) Top 20 enriched GO terms in biological process of genes coexpressed with TPSs.

Fig. S22. Volatile compound identification in leaf and stem of Cinnamomum camphora by GC-MS analysis. (a) Volatile compounds in leaf. (b) Volatile compounds in stem. Compounds shown in orange are monoterpenes, those in green are sesquiterpenes, and those in black are non-terpene compounds.

Fig. S24. Terpenoid biosynthesis gene characterization in Cinnamomum camphora. (a) Subcellular localization and transient function validation of TPS-YFP in tobacco. (b) Subcellular localization and transient function validation of C. camphora TPS45 in tobacco. (c) Subcellular localization and transient function validation of C. camphora TPS79 in tobacco. Each group of upper panels shows images of the subcellular localization of TPS-YFP fusion proteins in tobacco mesophyll cells. Each bottom panel is the chromatogram of volatile compounds detected in tobacco leaves by GC-MS analysis.

Fig. S25. The phylogenetic tree constructed from representative single nucleotide polymorphisms (SNPs) obtained from Chr 1 of 75 resequenced Cinnamomum camphora accessions, using C. kanehirae as the outgroup. Branches and accession IDs in green represent samples in group I, and those in orange represent samples in group II in Fig. 4a and Fig. 4b. Individuals of C. camphora older than 100 years are marked by black triangles.

Fig. S27. Functional comparison analyses of Cinnamomum camphora genomes with high quality. (a) The number of pan-gene and core-gene families estimated based on pairwise gene family comparisons in 22 accessions (four genomes of C. camphora, one genome of C. kanehirae, one transcriptome of C. longepaniculatum, and 17 transcriptomes of C. camphora). Each black dot corresponds to a pan- or core-gene family size estimated by a particular combination. (b) A Venn analysis to obtain the private and shared genes of three C. camphora genomes. (c) KEGG enrichments of private genes in three C. camphora genomes. (d) KEGG enrichments of shared genes in three C. camphora genomes. (e) Comparisons of the Ka/Ks values of the private and shared genes in three C. camphora genomes.
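The dots in panel (a) of Fig. S27 each correspond to one combination of accessions. A minimal Python sketch of that idea follows, treating the pan-gene set as the union and the core-gene set as the intersection of the gene families present in each combination. The accession labels and family identifiers are hypothetical, and the source does not state that the counts were computed exactly this way.

```python
from itertools import combinations


def pan_core_sizes(presence, k):
    """Return (pan size, core size) for every combination of k accessions.

    `presence` maps an accession label to the set of gene families it contains.
    """
    results = []
    for combo in combinations(sorted(presence), k):
        family_sets = [presence[accession] for accession in combo]
        pan = set.union(*family_sets)          # families in at least one accession
        core = set.intersection(*family_sets)  # families in every accession
        results.append((len(pan), len(core)))
    return results


# Hypothetical presence table for three accessions (illustration only).
presence = {
    "acc_A": {"fam1", "fam2", "fam3"},
    "acc_B": {"fam1", "fam2", "fam4"},
    "acc_C": {"fam1", "fam3", "fam5"},
}

if __name__ == "__main__":
    for k in range(1, len(presence) + 1):
        print(k, pan_core_sizes(presence, k))
```

As more accessions are combined, the pan size can only grow or stay flat and the core size can only shrink or stay flat, which gives the two diverging curves typical of such plots. Enumerating all combinations becomes expensive for 22 accessions, so a practical version would sample combinations at each k.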
The species tree based on amino acid sequences of identified single-copy orthogroups, inferred with a coalescent-based method from 22 samples with Cinnamomum kanehirae as the outgroup. Samples marked with red dots are based on genome files, and those marked with blue dots are based on de novo assembled transcriptome files. Bold capital letters indicate the sample collection site. GX represents Guangxi Province, JX represents Jiangxi Province, HB represents Hubei Province, ZJ represents Zhejiang Province, and GD represents Guangdong Province.